Hi, welcome to this second lesson, on data exploration.
In the previous lesson, you had an overview of the EpiGraphHub platform and got introduced to the different menus, pages and general concepts you will encounter when using the platform.
In this lesson, you will explore your first dataset, and create your first chart.
By the end of this lesson,
switzerland.foph_hosp_d) from the Swiss
Federal Office of Public Health (FOPH)To select the dataset that we want to explore, let’s first go to the
Data - Datasets page.
There, you can see the list of all public datasets, and you can click on the name of the dataset you want to explore.
For this lesson, we will use the example of the
foph_hosp_d dataset. This dataset represents the daily
hospitalisations of COVID-19 in Switzerland according to the FOPH
(Federal Office of Public Health).
After selecting the dataset, you will be on the Explore page, which is divided into three different panels.
On the left panel, you can see the currently selected dataset (here
switzerland.foph_hosp_d), and the list of
Columns in the dataset. The data type of the column is
represented by the small icon that is on the left side of each column
name.
In the middle panel, you can choose the visualisation type you want, specify the time granularity (e.g., daily, weekly) and time range (from X to Y) of interest. In this panel, you can also define the query that will be used to build your chart. The query allows you to define which data to include (e.g., Metric fields), how to organise them (e.g., Group field) and how to limit them (e.g., Row limit, Filter fields).
On the right panel, you can see the resulting chart and data generated by the query specified in the middle panel.
We can collapse the left panel to focus on the analysis (middle panel) and visualisation of the data (right panel), selecting the arrow icon highlighted below:
Let’s now have a closer look at the middle panel of the page. It has three different subsections: Chart type, Time, and Query.
By default, the VISUALISATION TYPE is a Table. To see
the different ways you can display your data and potentially change the
chart type, click on the current VISUALISATION TYPE
(Table).
You can see that a large number of other visualisations are available, such as Time-series, Box Plots, Maps, Bar Charts, Line Charts.
To find the visualisation that suits your needs best, you can either explore All charts available, or focus on the most Popular ones, or by Category of interest.
For instance, among Distribution charts, you may
choose between Histograms and Box Plots; among
the Evolution charts category, you can choose between
Line Charts, and different types of
Time-series; the Maps category include
Country Maps and World Maps.
In the Time section, three key fields are defined:
no filter,
last quarter, previous calendar year, or more
custom and advanced time ranges,daily, weekly,
monthly.By default we will now explore all (i.e. no filter on
time) daily data, based on the date column in
our table.
Let’s now have a closer look at the Query section. This section presents two different ways of exploring the data (QUERY MODE) represented by the AGGREGATE and RAW RECORDS options.
In this section, we will see an example with the RAW RECORDS mode, as a first exploration step of our dataset. When this option is selected you will visualise the fields below.
In COLUMNS, you decide what columns of the dataset you want to
visualise. Let’s say we are interested by the date,
geoRegion, entries and sumTotal
columns. To select a column, click on the COLUMNS field, a drop-down
list will appear, with all columns you can select from.
Go through this list, and select your columns of interest (here
date, geoRegion, entries and
sumTotal). The COLUMNS field should look as follows:
After selecting these columns in the COLUMNS field, you will notice
that a button labeled RUN QUERY will appear on the the
right panel of the Explore page.
If you press this button, the table chart with the columns that you selected will appear on the same panel.
Under the table chart, Data resulting from your
query also appear, with the number of data rows retrieved (here
10k rows retrieved).
Once you get the table, if you don’t want to make any new
explorations, you can export this table as a .JSON or
.CSV file by pressing the corresponding button in the right
upper corner.
But, we can still explore many other features, to get an insight on the COVID hospitalisations in Switzerland.
Right below the COLUMNS field, there is a field called ORDERING. In this field, you can order the dataset in ascending or decreasing order according to some column values.
For example, let’s order the dataset by the date
column, in ascending order; click on ORDERING field and
select data [asc].
We can add a second ordering level. For example, for each date, we
can further order the rows by number of entries, in
descending order. To do so, click again on the ORDERING
field, and select entries [desc]. You should then get this
result:
In the ROW LIMIT field you can select how many rows you want to get from the data. We can keep it here to 10000, as follows:
In the FILTERS field you can apply many different filters in the dataset.
Let’s explore some of these filters.
After clicking on + Add filter the following window will
show up on your screen:
column on which
you want to apply the filter. This option can be filled with any column
of the dataset.operator you
want to apply to the data.value of reference for your
filter.We have the following operators available:
equals: You will use this operator to select
only the rows of a dataset with a specific value. The operator equals
can be applied for all types of columns. For example, in the case of the
foph_hosp_d dataset, you can use this filter if you just
want to get the values in which
geoRegion equals GE.
not equals: We can use this operator to avoid
some value. For example, if you don’t want to get the rows with entries
equal a 0 you can apply the following filter:
entries not equals 0.
< : This operator only can be applied in numeric columns. You should use it when you want the values returned in a column to be lower than some threshold defined in the filter value.
> : This operator only can be applied in numeric columns. You should use it when you want the values returned in a column to be bigger than some threshold defined in the filter value.
< = : This operator only can be applied in numeric columns. You should use them when you want the values returned in a column to be lower or equal to a threshold defined in the filter value.
> = : This operator only can be applied in numeric columns. You should use them when you want the values returned in a column to be bigger or equal to a threshold defined in the filter value.
IN : This operator can be used to select values
in a column equal to some group of values. For example, in our example
dataset, you could want to select the rows where the
geoRegion in (GE, FR). Note that you can choose as many
values as you want.
NOT IN: This operator makes the inverse of the
IN operator. This filter is used when you want the table
returned to exclude the values added in the filter value.
LIKE (case insensitive) and LIKE : This operator can be used to select values in columns with the text values following some pattern. The difference between the LIKE (case insensitive) and the LIKE is that the first operator (insensitive) doesn’t differentiate between the upper and lower cases.
IS NOT NULL : This operator is used to select only rows with non-null values according to some column.
IS NULL : This operator is used to select only rows with null values according to some column.
For example, in the case of the foph_hosp_d dataset,
let’s use this filter to get the rows in which
geoRegion equals GE, as follows:
After configuring the filter you need to press the SAVE
button and the RUN QUERY button again. You can see the
result of applying this filter below:
In our platform, we can apply multiple filters at the same time.
For example, in the dataset foph_hosp_d, the column
sumtotal represents the cumulative number of
hospitalisations. Let’ filter the data to keep only rows where the
cumulative number of hospitalisations exceeds 100.
To do so, click on the + icon on the right of
FILTERS field, and add the filter sumtotal >= 100), as
shown below:
And that’s the result:
When you scroll down the table chart, you can already see clearly the first COVID wave, which started end of March 2020, and the more impressive next waves, i.e. around October-November 2020, and beginning of 2022!
It is now time to give a title to our first table
chart. Let’s name it
Daily COVID Hospitalisations in Geneva. To do so, click on
- untitled and repalce it with your new title, as
follows:
Now, you can save this chart for later presentations
and dashboard development by clicking on +SAVE button on
top of the middle panel:
This will open a Save chart pop-up window, where the CHART NAME field will be pre-filled with the title you gave to your chart:
You can also see that there is an ADD TO DASHBOARD field, which allows you to add the chart you are saving to a dashboard.
You can add in this field the name of a dashboard, whether this dashboard already exists or not. If it does not exist, it will create it before adding the current chart to it.
Let’s do that and name our first dashboard
COVID Hospitalisations in Geneva and Switzerland. You can
now click on SAVE.
GREAT! you have just created your first chart and dashboard :D
Let’s go quickly to the Charts menu to check that our table chart has been saved properly.
We can see here that the chart appears as expected in the list of
saved charts, with the title we just gave it, type of chart
Table, dataset switzerland.foph_hosp_d, and
that it was just created and modified by myself logged as
Aziza Merzouki.
Same for the dashboard, let’s go to the Dashboards
menu to check that our
COVID Hospitalisations in Geneva and Switzerland has been
created and saved properly.
Notice that the Status of our dashboard is
Draft. In the next lesson, we will see how we can publish
it, once it is ready to be shared with all our colleagues :)
If you click on your dashboard, you should see the following:
Congratulations! You reached the end of this second lesson, where you explored your first dataset, created your first table chart, and started building your first dashboard.
Hopefully, you are now more comfortable with the Explore Page of the EpiGraphHub, its different panels, and its main fields to explore datasets and create charts.
In the next lesson you will create new types of chart and finalize building your dashboard. See you there :)
The following team members contributed to this lesson:
This work is licensed under the Creative Commons Attribution Share Alike license.